Estimating Frequency Moments of Data Streams Using Random Linear Combinations
Abstract
The problem of estimating the $k$-th frequency moment $F_k$, for any nonnegative $k$, over a data stream by looking at the items exactly once as they arrive was considered in a seminal paper by Alon, Matias and Szegedy [1, 2]. The space complexity of their algorithm is $\tilde{O}(n^{1-1/k})$. For $k > 2$, their technique does not apply to data streams with arbitrary insertions and deletions. In this paper, we present an algorithm for estimating $F_k$ for $k > 2$ over general update streams, whose space complexity is $\tilde{O}(n^{1-1/(k-1)})$ and whose time complexity for processing each stream update is $\tilde{O}(1)$. Recently, an algorithm for estimating $F_k$ over general update streams with similar space complexity was published by Coppersmith and Kumar [7]. Our technique is (a) fundamentally different from the technique used in [7], (b) simpler and symmetric, and (c) significantly more efficient in terms of the time required to process a stream update ($\tilde{O}(1)$ compared with $\tilde{O}(n^{1-1/(k-1)})$).
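To make the quantity being estimated concrete, the following is a minimal sketch of the exact, non-streaming baseline, not the algorithm of the paper. It assumes the standard definition $F_k = \sum_i |f_i|^k$, where $f_i$ is the net frequency of item $i$ after all insertions and deletions. The function name `exact_frequency_moment` and the `(item, delta)` update format are illustrative assumptions. This baseline stores one counter per distinct item, which is exactly the linear space cost that sublinear-space estimators aim to avoid.

```python
from collections import defaultdict


def exact_frequency_moment(updates, k):
    """Return F_k = sum_i |f_i|^k for a stream of (item, delta) updates.

    Hypothetical helper for illustration only: it keeps one counter per
    distinct item, so its space is linear in the number of items.
    """
    freq = defaultdict(int)
    for item, delta in updates:
        # A positive delta is an insertion, a negative delta is a deletion.
        freq[item] += delta
    # Skip zero frequencies so that F_0 counts only items still present
    # (Python evaluates 0 ** 0 as 1, which would otherwise inflate F_0).
    return sum(abs(f) ** k for f in freq.values() if f != 0)


# A small general update stream: "b" is inserted and then fully deleted.
stream = [("a", 3), ("b", 2), ("a", -1), ("c", 1), ("b", -2)]
moments = {k: exact_frequency_moment(stream, k) for k in range(4)}
```

Here the net frequencies are 2 for "a", 0 for "b" and 1 for "c", so $F_0$ counts the two surviving items and $F_3 = 2^3 + 1^3$.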
Related articles
Better Bounds for Frequency Moments in Random-Order Streams
Estimating frequency moments of data streams is a very well-studied problem [1–3,9,12], and tight bounds are known on the amount of space that is necessary and sufficient when the stream is adversarially ordered. Recently, motivated by various practical considerations and applications in learning and statistics, there has been growing interest in studying streams that are randomly ordered [3,4...
A Very Efficient Scheme for Estimating Entropy of Data Streams Using Compressed Counting
Compressed Counting (CC) was recently proposed for approximating the αth frequency moments of data streams, for 0 < α ≤ 2. Under the relaxed strict-Turnstile model, CC dramatically improves the standard algorithm based on symmetric stable random projections, especially as α → 1. A direct application of CC is to estimate the entropy, which is an important summary statistic in Web/network measure...
Entropy Estimations Using Correlated Symmetric Stable Random Projections
Methods for efficiently estimating the Shannon entropy of data streams have important applications in learning, data mining, and network anomaly detection (e.g., DDoS attacks). For nonnegative data streams, the method of Compressed Counting (CC) [11, 13], based on maximally-skewed stable random projections, can provide accurate estimates of the Shannon entropy using small storage. However, CC is...
Revisiting Frequency Moment Estimation in Random Order Streams
We revisit one of the classic problems in the data stream literature, namely that of estimating the frequency moments $F_p$ for $0 < p < 2$ of an underlying $n$-dimensional vector presented as a sequence of additive updates in a stream. It is well known that, using $p$-stable distributions, one can approximate any of these moments up to a multiplicative $(1 + \epsilon)$-factor using $O(\epsilon^{-2} \log n)$ bits of space, and...
Estimating Entropy of Data Streams Using Compressed Counting
The Shannon entropy is a widely used summary statistic, for example in network traffic measurement, anomaly detection, neural computations, and spike trains. This study focuses on estimating the Shannon entropy of data streams. It is known that Shannon entropy can be approximated by Rényi entropy or Tsallis entropy, which are both functions of the αth frequency moments and approach Shannon entropy a...